Short text classification applied to item description: Some methods evaluation
Authors
Abstract
The increasing demand for information classification based on content in the age of social media and e-commerce has led to the need for automated product classification using their descriptions. This study aims to evaluate various techniques for this task, with a focus on descriptions written in Portuguese. A pipeline is implemented to preprocess the data, including lowercasing, accent removal, and unigram tokenization. The bag-of-words method is then used to convert the text into numerical data, and five techniques are applied: argmaxtf, argmaxtfnorm, and argmaxtfidf from information retrieval, and two machine learning methods, logistic regression and support vector machines. The performance of each technique is evaluated using simple accuracy via thirty-fold cross-validation. The results show that [...] achieves the highest mean accuracy among the techniques.
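The preprocessing and bag-of-words steps named in the abstract can be sketched with the Python standard library alone. The abstract does not define argmaxtf, so the scoring rule below is only one plausible reading: each class is scored by summing, over the tokens of a description, the token count times the term frequency accumulated for that class in the training data, and the highest-scoring class is returned. All function names, the token pattern, and the toy descriptions are hypothetical. Logistic regression, support vector machines, and the thirty-fold cross-validation are not shown.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, strip accents, and split into unigram tokens."""
    text = text.lower()
    # NFKD splits an accented letter into base letter + combining mark;
    # dropping the combining marks leaves the unaccented letter.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: a unigram is a maximal run of ASCII letters or digits.
    return re.findall(r"[a-z0-9]+", no_accents)


def bag_of_words(text):
    """Map each token of a description to its count."""
    return Counter(preprocess(text))


def fit_class_tf(descriptions, labels):
    """Accumulate term frequencies per class over the training descriptions."""
    class_tf = defaultdict(Counter)
    for text, label in zip(descriptions, labels):
        class_tf[label].update(bag_of_words(text))
    return class_tf


def predict_argmax_tf(class_tf, text):
    """Assumed reading of 'argmaxtf': return the class with the highest summed tf."""
    bow = bag_of_words(text)
    scores = {
        label: sum(count * tf[token] for token, count in bow.items())
        for label, tf in class_tf.items()
    }
    # Sorting the labels first makes ties resolve alphabetically.
    return max(sorted(scores), key=scores.get)


# Hypothetical toy data; the study's dataset is not reproduced here.
train_texts = [
    "Feijão preto 1kg",
    "Arroz branco tipo 1",
    "Sabão em pó 500g",
    "Detergente líquido limão",
]
train_labels = ["food", "food", "cleaning", "cleaning"]

model = fit_class_tf(train_texts, train_labels)
print(predict_argmax_tf(model, "SABÃO líquido"))
```

Under this reading, the normalized and tf-idf variants would differ only in how the per-class weights are computed before the argmax, which is again an assumption rather than something the abstract states.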
Similar resources
An evaluation of text classification methods for literary study
This article presents an empirical evaluation of text classification methods in literary domain. This study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs) in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were a...
Evaluating Text Clustering Methods for Text Classification
In this project report, I will evaluate the several text clustering approaches and how they can be used for the purpose of text classification. The particular task is topic classification of 20 Newsgroup dataset and sentiment classification restaurant reviews dataset. Future direction for improving the results will also be discussed.
A Redundant Covering Algorithm Applied to Text Classification
Covering algorithms for learning rule sets tend toward learning concise rule sets based on the training data. This bias may not be appropriate in the domain of text classification due to the large number of informative features these domains typically contain. We present a basic covering algorithm, DAIRY, that learns unordered rule sets, and present two extensions that encourage the rule learne...
Text Classification with Tournament Methods
This paper compares the effectiveness of n-way (n > 2) classification using a probabilistic classifier to the use of multiple binary probabilistic classifiers. We describe the use of binary classifiers in both Round Robin and Elimination tournaments, and compare both tournament methods and n-way classification when determining the language of origin of speakers (both native and non-native Engli...
Transductive LSI for Short Text Classification Problems
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI’s singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-...
Journal
Journal title: Semina
Year: 2022
ISSN: 1676-5435, 1679-0367
DOI: https://doi.org/10.5433/1679-0375.2022v43n2p189